AEP v2.0.0 — autonomy loop (G2–G7 + full_auto) by memorysaver · Pull Request #11 · memorysaver/agentic-engineering-patterns

memorysaver · 2026-06-15T16:22:44Z

Summary

Implements the loop-engineering autonomy gaps from docs/research/loop-engineering-autonomy-gap.md (G2–G7) plus the A1 full_auto master switch. Builds on v1.8.0, which removed claude-team in favor of a native-bg-subagent default with a post-spawn liveness probe. Every new capability defaults to human-in-the-loop; autonomy is opt-in via topology.routing flags. Version bumped to 2.0.0.

The research and design lineage is included in the branch (docs/research/loop-engineering-autonomy-gap.md, docs/research/g4-dogfood-validation-design.md). Decisions were made interactively: G1 was rejected, full-auto was kept opt-in, and grouped_change was kept as the one documented exception to one-subagent-per-story.

What's in it

| Gap | Change |
| --- | --- |
| G2 | Change-strategy recovery ladder (`gen-eval/references/recovery-ladder.md`) wired into build Phase 5 + autopilot tick ④: same-fix → re-ground → fresh generator → decompose before the human gate. |
| G3 | Visual Design evaluator dimension (gen-eval scoring + evaluator contract); multimodal on both hosts. |
| G4a | Post-merge guard (`autopilot/references/post-merge-guard.md` + tick ③.5): deploy-health monitoring, conservative `auto_revert` (default off). |
| G4b | Host-aware dogfood (`executor/references/dogfood-validation.md`): Claude → agent-browser; Codex → native browser/computer-use or Playwright; config-first URL with CI fallback. |
| G5 | Telemetry-driven reflect (`reflect/references/telemetry-ingestion.md`): auto-ingestion + quantitative outcome auto-eval. |
| G6 | New `/aep-watch` skill for self-feeding work discovery (registered in marketplace.json). |
| G7 | Loop hygiene unified on `--max-turns`. |
| A1 | `topology.routing.full_auto` (default false) master switch over the strategic human gates. |
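A minimal Python sketch of the G2 ladder's control flow, assuming each rung can be modeled as a callback that reports whether evaluation converged. The rung order, `recovery_rung`, and the `eval_not_converging` gate come from this PR; `climb_ladder`, `LadderResult`, and the rung identifiers are hypothetical names, not the skill's actual interface.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Rung order as described in the PR; the identifiers themselves are hypothetical.
RUNGS = ("same_fix", "re_ground", "fresh_generator", "decompose")


@dataclass
class LadderResult:
    converged: bool
    recovery_rung: Optional[str]      # last rung attempted
    escalation: Optional[str] = None  # set only when the human gate is reached
    attempts: List[str] = field(default_factory=list)


def climb_ladder(try_rung: Callable[[str], bool]) -> LadderResult:
    """Try each rung in order and escalate to the human gate only after all fail."""
    attempts: List[str] = []
    for rung in RUNGS:
        attempts.append(rung)
        if try_rung(rung):  # the callback reports whether evaluation converged
            return LadderResult(True, rung, None, attempts)
    return LadderResult(False, RUNGS[-1], "eval_not_converging", attempts)


# Example: convergence on the third rung, so "decompose" is never tried.
result = climb_ladder(lambda rung: rung == "fresh_generator")
print(result.recovery_rung, result.attempts)
```

Each rung changes strategy instead of repeating the previous attempt, so the human gate is reached only after the cheaper recoveries are exhausted.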

Safety posture

Defaults preserve current human-gated behavior everywhere; full_auto / auto_revert / auto_outcome_eval / watch.auto_create are explicit opt-ins.
Orchestrator boundary intact (signals/CI/gh only; no workspace-code reads, no gh pr merge from main).
Every spawn path uses native-bg-subagent with a mandatory post-spawn liveness probe; the one-launch = one-subagent = one-story invariant is now explicit (grouped_change is the documented exception).
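One way to read the safety posture is as a config object whose opt-ins all default to false, with `full_auto` consulted only at strategic gates. The flag names mirror the keys listed above, but flattening them into one dataclass, and the `pauses_for_human` helper, are illustrative assumptions; the real keys live in the product-context schema (for example `topology.routing.full_auto` and `watch.auto_create`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutonomyConfig:
    # Hypothetical flattened view of the opt-in keys.
    # Every opt-in defaults to False, which preserves human-gated behavior.
    full_auto: bool = False          # stands in for topology.routing.full_auto
    auto_revert: bool = False
    auto_outcome_eval: bool = False
    watch_auto_create: bool = False  # stands in for watch.auto_create


def pauses_for_human(gate_is_strategic: bool, cfg: AutonomyConfig) -> bool:
    """full_auto only lifts strategic gates; this sketch keeps every other gate human-gated."""
    if gate_is_strategic:
        return not cfg.full_auto
    return True


print(pauses_for_human(True, AutonomyConfig()))                # default: pause
print(pauses_for_human(True, AutonomyConfig(full_auto=True)))  # opted in: no pause
```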

Process

Built via parallel sub-agents (new files plus per-file wiring), followed by a design-review subagent pass. All of its findings (1 blocker, 5 should-fix, and nits) are addressed in `fix(aep-v2): address design review`. Notably: telemetry-ingestion.md is now authored in the canonical _shared/references/ (the build-generated copy had been wiped); guard_state, the new escalation-enum value, and recovery_rung are registered; and doc counts are fixed.

Verification

bash scripts/build-skills.sh --check → in sync
product-context schema + marketplace.json parse clean
lefthook (oxlint/oxfmt/skills-build) green on every commit
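The PR does not show how `scripts/build-skills.sh --check` works internally. As a rough Python illustration only, assuming "in sync" means every materialized copy matches its canonical file in `_shared/references/`, a drift check could look like this; `find_drift`, the `placements` mapping, and the directory names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def find_drift(shared: Path, placements: Dict[str, List[Path]]) -> List[Path]:
    """Return materialized copies that are missing or differ from the canonical file.

    Hypothetical helper; raises FileNotFoundError if a canonical file is missing.
    """
    drift: List[Path] = []
    for name, targets in sorted(placements.items()):
        canonical = (shared / name).read_bytes()
        for target_dir in targets:
            copy = target_dir / name
            if not copy.is_file() or copy.read_bytes() != canonical:
                drift.append(copy)
    return drift


# Hypothetical layout: one canonical reference materialized into two skills.
placements = {
    "telemetry-ingestion.md": [Path("skills/reflect/references"), Path("skills/watch/references")],
}
```

An empty result corresponds to "in sync"; any returned path would need a rebuild from the canonical source.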

🤖 Generated with Claude Code

Web research on loop engineering (5 building blocks, ReAct, Ralph loop) mapped against current AEP workflow. Scorecard + gap classification (G1 fresh-context, G2 recovery ladder, G4 post-merge guard, G5 telemetry reflect, G6 self-feeding discovery, G7 hygiene) with priority ordering.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

All 7 gap-fill methods (G1-G7) confirmed cross-host compatible via the executor abstraction. Resolved two caveats: G3 visual evaluator (Codex confirmed multimodal), G7 unifies on --max-turns (drop codex-only token_budget as primary). G1 standardizes on exec/headless one-shot per task to avoid nesting limits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Spawn granularity in AEP is the story (one worker per story per round); deliberately not subdividing into per-task fresh contexts. G1 moved to a "Rejected" record with rationale; scorecard, gap buckets, priority, and compatibility tables updated. Gaps now G2-G7 (6 methods).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
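Story-level spawn granularity, together with the one-launch = one-subagent = one-story invariant and its `grouped_change` exception from the PR description, can be expressed as a check over one round's launch plan. This Python sketch is illustrative only; `Launch` and `invariant_violations` are hypothetical names, not AEP code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Launch:
    # Hypothetical record of one spawn in a round.
    subagent_id: str
    story_ids: Tuple[str, ...]
    grouped_change: bool = False  # the one documented exception


def invariant_violations(round_launches: List[Launch]) -> List[str]:
    """Check one round of launches against one-launch = one-subagent = one-story."""
    problems: List[str] = []
    for launch in round_launches:
        n = len(launch.story_ids)
        if n == 0:
            problems.append(f"{launch.subagent_id}: launch has no story")
        elif n > 1 and not launch.grouped_change:
            problems.append(f"{launch.subagent_id}: {n} stories in one launch")
    subagents = Counter(launch.subagent_id for launch in round_launches)
    stories = Counter(s for launch in round_launches for s in launch.story_ids)
    problems += [f"{sid}: subagent used by {n} launches" for sid, n in subagents.items() if n > 1]
    problems += [f"{story}: story assigned {n} times this round" for story, n in stories.items() if n > 1]
    return problems
```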

Post-deploy staging/prod validation with host-aware method selection: Claude Code auto-detects agent-browser; Codex uses native in-app browser+computer-use (desktop) or Playwright scripts (headless codex-exec, since computer-use is desktop-only). URL resolution = config first, CI fallback. Integration: upgrade Phase 6 + new post-deploy step. Issues auto-create stories via reflect classifier (links G6).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
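The host-aware selection and URL resolution reduce to two small decisions. The names `dogfood_method()` and `target_url()` appear in this PR, but their real signatures are not shown; the parameters and return strings in this Python sketch are assumptions, and agent-browser availability detection is omitted.

```python
from typing import Optional


def dogfood_method(host: str, desktop: bool) -> str:
    """Pick a dogfood method per host (hypothetical signature and return values)."""
    if host == "claude":
        return "agent-browser"
    if host == "codex":
        # computer-use is desktop-only, so headless codex exec falls back to Playwright scripts.
        return "native-browser+computer-use" if desktop else "playwright"
    raise ValueError(f"unknown host: {host!r}")


def target_url(configured: Optional[str], ci_reported: Optional[str]) -> Optional[str]:
    """Config first, CI fallback; None means there is nothing to dogfood against."""
    return configured or ci_reported
```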

…ood, telemetry reflect, self-feeding watch, visual eval, full-auto switch

Implements the retained loop-engineering gaps (G2–G7) plus the A1 full-auto master switch, all defaulting to human-in-the-loop (opt-in only).

- G2 recovery ladder: gen-eval/references/recovery-ladder.md; build Phase 5 and autopilot tick ④ climb same-fix → re-ground → fresh native-bg-subagent → decompose before the eval_not_converging human gate.
- G4 host-aware dogfood + post-merge guard: executor/references/dogfood-validation.md (dogfood_method()/target_url(), Claude=agent-browser, Codex=native/Playwright), autopilot/references/post-merge-guard.md + tick Step ③.5; build Phase 6 host-aware; on-issue → reflect story; hard regression → conservative auto_revert (default off).
- G5 telemetry reflect: reflect/references/telemetry-ingestion.md; reflect Step 1 auto-ingestion + Step 2.75 quantitative outcome auto-eval; tick layer-completion.
- G6 self-feeding discovery: new /aep-watch skill (registered in marketplace.json).
- G3 visual evaluator: Visual Design dimension in gen-eval scoring + evaluator contract.
- G7 loop hygiene: unified --max-turns budget; cap = possibly-unsolvable.
- A1 full_auto master switch (default false) gates strategic pauses; config keys added to product-context schema (all 3 templates).

Quick-reference updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
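G7's rule that hitting the `--max-turns` cap marks a task as possibly unsolvable can be sketched as a bounded loop. This is an illustrative Python sketch; `run_with_turn_budget` and the `possibly_unsolvable` status string are hypothetical and do not reflect how either host implements the flag.

```python
from typing import Callable


def run_with_turn_budget(step: Callable[[int], bool], max_turns: int) -> str:
    """Call step(turn) until it reports completion or the turn cap is reached (hypothetical helper)."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    for turn in range(1, max_turns + 1):
        if step(turn):
            return "done"
    # Exhausting the budget is surfaced as a signal rather than silently retried.
    return "possibly_unsolvable"
```

Using one turn-based cap on both hosts, instead of the Codex-only token_budget, keeps the signal comparable across executors.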

- B1: author telemetry-ingestion.md in canonical _shared/references/ (was created in a build-generated dir and wiped by build-skills.sh); rebuild materializes it into reflect/ + watch/, so G5 + watch ingestion now resolve.
- S1: add guard_state entry to autopilot state-schema (post-merge-guard idempotency).
- S3: register post_merge_regression in the escalation type enum.
- S2: document recovery_rung in eval-protocol status.json fields.
- S4: schema health_signals example ci → ci_status (matches the guard's key).
- S5: skill count 16 → 17 in README + orientation; add /aep-watch to orientation table.
- N1: brief Codex dogfood recipe pointer in codex-native.md.
- oxfmt markdown reformatting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
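The guard details touched by S1, S3, and S4 (an idempotency record in `guard_state`, the `post_merge_regression` escalation, and the `ci_status` health key) can be tied together in a Python sketch. The PR does not define a hard regression, so this sketch assumes it means an unhealthy deploy; `GuardState.handled`, the `deploy_healthy` key, the `"success"` value, and the returned action strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class GuardState:
    # Hypothetical stand-in for guard_state: merge SHAs already evaluated.
    handled: Set[str] = field(default_factory=set)


def post_merge_guard(merge_sha: str, health: Dict[str, object], auto_revert: bool,
                     state: GuardState) -> str:
    """Decide the follow-up for one merge; auto_revert mirrors the off-by-default flag."""
    if merge_sha in state.handled:
        return "skip"  # idempotent: a merge is acted on at most once
    state.handled.add(merge_sha)
    ci_ok = health.get("ci_status") == "success"  # a missing ci_status counts as not ok
    deploy_ok = bool(health.get("deploy_healthy", True))
    if deploy_ok and ci_ok:
        return "healthy"
    if not deploy_ok:  # assumed definition of a hard regression
        return "auto_revert" if auto_revert else "escalate:post_merge_regression"
    return "reflect_story"  # a softer issue becomes a story via reflect
```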

…coverage guard)

Closes the v2 telemetry gap: consumers shipped without a way to decide/wire sources.

- Coverage rule + coverage_check() helper in telemetry-ingestion.md (canonical _shared/references): a source is needed iff a quantitative success_metric or health_signal requires it.
- /aep-map gains a Telemetry Binding step (the decision owner): bind each needed signal to a detected/declared source via metric_map; flag the unmeasurable.
- /aep-scaffold audit detects the observability stack (Sentry/Datadog/PostHog/OTel/health endpoint) → candidate telemetry_sources.
- /aep-watch (Step 0 precondition), /aep-reflect Step 2.75, and post-merge guard run coverage_check() and BLOCK the auto path when the map binding is incomplete ("run /aep-map observability step") rather than silently no-op.
- schema documents telemetry_sources[].metric_map + the coverage rule.

Folded into the unreleased v2.0.0 (PR #11). oxfmt + build-skills in sync.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

memorysaver · 2026-06-15T17:06:11Z

Added: telemetry source determination (commit 94c76a2)

Closes the gap where v2's telemetry consumers (G5 reflect auto-eval, G6 /aep-watch, G4a post-merge guard) shipped with no way for a project to decide/wire telemetry_sources.

Hybrid rule (telemetry-ingestion.md §1.5): a source is needed iff some quantitative success_metric or health_signal requires it.
/aep-scaffold audit detects the observability stack → candidate sources.
/aep-map gains a Telemetry Binding step (the decision owner) — binds each needed signal to a source via metric_map; flags the unmeasurable.
Shared coverage_check() → /aep-watch (Step 0), /aep-reflect Step 2.75, and the post-merge guard block the auto path when the binding is incomplete ("run /aep-map observability step") — never silently no-op.
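The hybrid rule and the blocking behavior can be sketched in Python as follows, assuming each entry in `telemetry_sources` carries a `metric_map` keyed by signal name (the PR names `telemetry_sources[].metric_map` but not its shape). `Signal`, `TelemetryBindingIncomplete`, and `require_coverage` are hypothetical; `coverage_check()` is a name from the PR, but the signature here is assumed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Signal:
    name: str           # a success_metric or health_signal
    quantitative: bool  # only quantitative signals need a telemetry source


class TelemetryBindingIncomplete(RuntimeError):
    """Hypothetical error used to block the auto path."""


def coverage_check(signals: List[Signal], telemetry_sources: List[Dict]) -> List[str]:
    """Return the quantitative signals that no source's metric_map binds (assumed signature)."""
    bound = set()
    for source in telemetry_sources:
        bound.update(source.get("metric_map", {}))  # adds the signal names (dict keys)
    return [s.name for s in signals if s.quantitative and s.name not in bound]


def require_coverage(signals: List[Signal], telemetry_sources: List[Dict]) -> None:
    """Block the auto path loudly instead of silently doing nothing."""
    missing = coverage_check(signals, telemetry_sources)
    if missing:
        raise TelemetryBindingIncomplete(
            f"unbound signals {missing}: run /aep-map observability step")
```

Qualitative signals never demand a source, which is what keeps the rule from forcing telemetry on projects that have no quantitative metrics.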

Folded into the unreleased v2.0.0. Passed a focused design review (no blockers; fixed a dangling claim about /aep-onboard detection, since onboard is tooling-only).

memorysaver and others added 8 commits June 15, 2026 23:54

chore: release v2.0.0 (autonomy loop — G2–G7 + full_auto)

c43288d

memorysaver merged commit 071f98c into main Jun 15, 2026
2 checks passed

memorysaver deleted the feat/aep-v2-autonomy branch June 15, 2026 23:43
